# ChatBox
* PROOF OF CONCEPT * PROOF OF CONCEPT * PROOF OF CONCEPT * PROOF OF CONCEPT *

## Introduction

Steganographic use of Arithmetic Coding with Stochastic Ordered CFG.

Like this:

encode: `01101110100101` -> `(()(())(())())`

decode: `(()(())(())())` -> `01101110100101`

Similar to, and inspired by http://www.nic.funet.fi/pub/crypt/old/mimic/mimic.text

Beware: experimental. The crypto has been reviewed only once and should receive more attention. However, the methodology appears to be sound and the program should now be fast and robust enough for practical use.

## Explainer

Choose an unambiguous context-free grammar. Add weights and order the productions for each nonterminal.
For example, if S -> α | β for some α and β, we may choose S -> 1/4 α | 3/4 β.

Now to encode a binary string, think of it as the binary expansion r = 0,01110101... (say) of a number between 0 and 1.

Starting from the sentential form S (start symbol), go through the productions of the leftmost nonterminal (S), compare r to the cumulative weights of the rules. Because there are only two productions in the example, there would be only one comparison. Since r > 1/4, we proceed by replacing S with the string α and consider the first nonterminal of α and its weighted list of productions, scaling their cumulative weights to the range [1/4 .. 1].

This short explanation misses many important details, but that's the gist of it. We're working on a complete, mathematically rigorous description. It will appear in this repo at some point.

## Features

Stochastic Ordered CFGs can be specified in a somewhat intuitive YAML format. See examples/ folder for some examples. The nonterminals are numbered, 0 is always the start symbol.

The ChatBox binary can validate grammars for well-formedness and several other important properties, but not for unambiguity. That's undecidable, unfortunately, and it's up to the user to verify.

The ChatBox binary can be given a grammar and a shared secret for cryptographic message encapsulation and run in --encode or --decode mode. One can then pipe messages into it and obtain sentential forms encoding the messages in --encode mode, or pipe sentential forms and obtain the original messages in --decode mode. If the grammar is prefixfree, you may use the --prefixfree option and just input the sentential forms (i.e. encoded messages that you receive from the encoder at the other end) one after the other, otherwise they must be externally delimited by always prefixing their lengths.

## Build

install rust nightly (uses "experimental" feature: trait aliases)

`cargo test --features linalg`

try

 `cargo run -- --grammar examples/easy1.yaml  --validate`

 `cargo run -- --grammar examples/easy1.yaml  --encode | cargo run -- --grammar examples/easy1.yaml --decode`

